ITEP-84336: Fix incorrect camera pose for models generated with VGGT #1139

Open
daddo-intel wants to merge 32 commits into main from fix/ITEP-84336-incorrect-cam-pose-vggt
daddo-intel (Contributor) commented Mar 5, 2026

📝 Description

VGGT outputs camera poses and depth in scale-ambiguous units, which caused:

  1. Incorrect mesh dimensions
  2. Incorrect pixels_per_meter in renderTopView
  3. Inconsistent scale compared to MapAnything

This PR implements automatic metric scaling for VGGT using the known camera poses provided to _processOutputs(), and applies the resulting scale factor to:

  1. Camera translations
  2. world_points_from_depth
  3. world_points
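One way such a scale factor could be estimated and applied is sketched below. This is an illustration under stated assumptions, not the code in this PR: the helper names `estimate_metric_scale` and `apply_metric_scale` are hypothetical, and the sketch assumes that predicted and known camera positions are given in the same order and that the reconstruction can be scaled uniformly about its origin.

```python
import numpy as np


def estimate_metric_scale(pred_positions, known_positions, eps=1e-9):
    """Hypothetical helper: ratio of known to predicted inter-camera distances."""
    pred = np.asarray(pred_positions, dtype=float)
    known = np.asarray(known_positions, dtype=float)
    # At least two cameras with matching shapes are needed to compare distances.
    if pred.ndim != 2 or pred.shape != known.shape or len(pred) < 2:
        return 1.0  # not enough information: leave the output unscaled

    # All unordered camera pairs (i < j).
    i, j = np.triu_indices(len(pred), k=1)
    pred_d = np.linalg.norm(pred[i] - pred[j], axis=1)
    known_d = np.linalg.norm(known[i] - known[j], axis=1)

    # Skip pairs that coincide in the prediction to avoid dividing by zero.
    valid = pred_d > eps
    if not valid.any():
        return 1.0

    # The median ratio tolerates a few bad pose entries.
    return float(np.median(known_d[valid] / pred_d[valid]))


def apply_metric_scale(scale, translations, world_points, world_points_from_depth):
    """Hypothetical helper: multiply every length-valued output by the same factor."""
    return tuple(
        np.asarray(a, dtype=float) * scale
        for a in (translations, world_points, world_points_from_depth)
    )
```

Because only distances between cameras are compared, the estimate does not depend on how the known poses are rotated or translated relative to the VGGT frame.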

✨ Type of Change

Select the type of change your PR introduces:

  • 🐞 Bug fix – Non-breaking change which fixes an issue
  • 🚀 New feature – Non-breaking change which adds functionality
  • 🔨 Refactor – Non-breaking change which refactors the code base
  • 💥 Breaking change – Changes that break existing functionality
  • 📚 Documentation update
  • 🔒 Security update
  • 🧪 Tests
  • 🚂 CI

🧪 Testing Scenarios

Describe how the changes were tested and how reviewers can test them too:

  • ✅ Tested manually
  • 🤖 Ran automated end-to-end tests

✅ Checklist

Before submitting the PR, ensure the following:

  • 🔍 PR title is clear and descriptive
  • 📝 For internal contributors: If applicable, include the JIRA ticket number (e.g., ITEP-123456) in the PR title. Do not include full URLs
  • 💬 I have commented my code, especially in hard-to-understand areas
  • 📄 I have made corresponding changes to the documentation
  • ✅ I have added tests that prove my fix is effective or my feature works

@daddo-intel daddo-intel marked this pull request as ready for review March 5, 2026 00:44
@daddo-intel daddo-intel requested a review from saratpoluri March 5, 2026 12:13
@daddo-intel daddo-intel requested a review from ltalarcz March 5, 2026 12:14

def wait_for_result(api_url: str, request_id: str, verify_ssl: bool, timeout_s: int = 600, poll_s: float = 2.0):
def wait_for_result(api_url: str, request_id: str, verify_ssl: bool,
timeout_s: int = 15 * 60, poll_s: float = 1.5):

Maybe instead of calculating 15 * 60 we can just set it to 900?

files.append(("camera_ids", (None, camera_id)))
cam_loc = camera_loc_by_id.get(camera_id)
if cam_loc is not None:
cam_loc = camera_loc_by_id.get(camera_id)

redundant if statement


for camera in cameras:
cam_id = camera.sensor_id
camera_order.append(cam_id)

camera_order.append(camera.sensor_id)

cam_loc = None
if camera_locations and idx < len(camera_locations):
try:
cam_loc = json.loads(camera_locations[idx])

Maybe we should log that camera_locations[idx] is not valid JSON. Otherwise, at runtime we wouldn't know whether something was provided but invalid, or nothing was provided and everything is fine.
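A possible shape for this, as a sketch only: the helper name `parse_camera_location` and the module-level `logger` are assumptions for illustration and are not taken from the PR. It treats a missing or empty entry as "nothing provided" and logs a warning only when a value was supplied but could not be parsed.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_camera_location(camera_locations, idx):
    """Hypothetical helper: return the parsed location, or None if absent or invalid."""
    # Nothing provided for this camera: not an error, so stay silent.
    if not camera_locations or idx >= len(camera_locations):
        return None
    raw = camera_locations[idx]
    if raw is None or not str(raw).strip():
        return None

    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError) as exc:
        # A value was provided but is unusable: make that visible at runtime.
        logger.warning(
            "camera_locations[%d] is not valid JSON (%s); ignoring it", idx, exc
        )
        return None
```

With this split, an empty entry stays quiet while a malformed one leaves a trace in the logs.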

# Median is robust to occasional bad pose entries
return float(np.median(distances))

except Exception:

Should we at least log which exception occurred?
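For illustration, a narrower handler that records the failure might look like the sketch below. The function name `median_camera_distance`, the pairwise-distance computation, and the `None` fallback are assumptions; only the median over `distances` and the trailing `except` appear in the diff.

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def median_camera_distance(positions):
    """Hypothetical helper: median pairwise distance, or None on bad input."""
    try:
        pts = np.asarray(positions, dtype=float)
        if pts.ndim != 2 or len(pts) < 2:
            return None
        # Distances over all unordered camera pairs (i < j).
        i, j = np.triu_indices(len(pts), k=1)
        distances = np.linalg.norm(pts[i] - pts[j], axis=1)
        # Median is robust to occasional bad pose entries
        return float(np.median(distances))
    except (TypeError, ValueError) as exc:
        # Log the exception and its traceback instead of swallowing it.
        logger.warning("Could not compute camera distances: %s", exc, exc_info=True)
        return None
```

Catching specific exception types keeps unexpected errors visible, and `exc_info=True` attaches the traceback to the log record.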

3 participants